Abstract

For a pure Horn Boolean function on $n$ variables, we show that unless P = NP, it is not possible to approximate in polynomial time (in $n$) the minimum numbers of clauses and literals to within factors of $2^{\log^{1-o(1)} n}$, even when the inputs are restricted to 3-CNFs with $O(n^{1+\epsilon})$ clauses, for some small $\epsilon > 0$. Furthermore, we show that unless the Exponential Time Hypothesis is false, it is not possible to obtain constant factor approximations for these problems even when sub-exponential time (in $n$) is available.

1 Introduction 

Horn functions constitute a rich and important subclass of Boolean functions: they usually present nice structural properties, occur in a myriad of applications, and their satisfiability problem (SAT) can be solved in linear time for Horn formulae (cf. Dowling and Gallier [15], Itai and Makowsky [50], and Minoux [53]), while SAT is NP-hard for general Boolean formulae (e.g., Garey and Johnson [16]). Hence, the problem of finding short (according to some well-defined measure of minimality) Horn CNF and 3-CNF representations of Horn functions is appealing from both theoretical and practical standpoints.

For several different measures of minimality, the problem of finding short Horn formulae is known to be NP-hard: see Maier [35], Ausiello et al. [5], Hammer and Kogan [18], Boros and Cepek [7], and Boros, Cepek, and Kucera, "Finding a shortest representation of a pure Horn 3-CNF is hard" (unpublished).

In this article, we focus on the hardness of approximating (i) the minimum number of clauses of pure 
Horn functions specified through pure Horn CNFs and (ii) the minimum numbers of clauses and literals of 
pure Horn functions specified through pure Horn 3-CNFs (i.e., when each clause has size at most three). 

The first partial result in this realm was provided by Bhattacharya et al. [B]. Specifically, it was shown that unless $\mathrm{NP} \subseteq \mathrm{QP}$ (that is, unless every problem in NP can be solved in quasi-polynomial time in the size of the input's representation), the minimum number of clauses of a pure Horn function on $n$ variables specified through a pure Horn CNF formula cannot be approximated in polynomial time (in $n$) to within a factor of $2^{\log^{1-\epsilon} n}$, for some small constant $\epsilon > 0$. Their result is based on a gap-preserving reduction from a fairly well-known network design problem, namely MinRep [21], and has two main components: a gadget that associates with every MinRep instance $M$ a pure Horn CNF formula $h$ and links the size of an optimal solution to $M$ to the size of a clause minimum pure Horn CNF representation of $h$, and a gap amplification device that provides the referred gap. Although both components are necessary for the result, each works in a rather independent way.
author also gratefully acknowledges the partial support by the joint CAPES (Brazil)/Fulbright (USA) fellowship process BEX-2387050/15061676.
We strengthen the result of Bhattacharya et al. [5] in the following ways. We start from a related but slightly different problem, called Label-Cover, and present a new gap-preserving reduction¹ to the problem of determining the minimum number of clauses in a pure Horn CNF representation of a pure Horn function. In arguing about the correctness of our reduction, we show that our gadget forms an exclusive component of the function in question and hence can be minimized separately from the gap amplification device (cf. Boros et al. [H]); it then becomes clear that the same principle underlies Bhattacharya et al.'s result. Our hardness of approximation factor is slightly larger and rests on a weaker complexity hypothesis.

The explicit independence of both parts and the fact that our gadget is somewhat more complicated (for MinRep is a "simplification" of Label-Cover) allow us to slightly modify our reduction and obtain a similar result for the case where the representation is restricted to 3-CNFs. We can then derive, in a straightforward fashion, a hardness result on the minimum number of literals of pure Horn functions. We note that these results cannot be obtained from Bhattacharya et al.'s gadget. Furthermore, as our reductions have the Label-Cover problem as starting point, we can show, under a stronger complexity hypothesis, that even when sub-exponential time (in the number of variables) is allowed for computing the minimum numbers of clauses and literals of the function, it is not possible to obtain constant factor approximations of these values without refuting the new hypothesis.

More specifically, for a pure Horn Boolean function $h$ on $n$ variables, we show that unless P = NP, the minimum number of clauses in a CNF or 3-CNF representation of $h$ and the minimum number of literals in a 3-CNF representation of $h$ cannot be approximated in polynomial time (in $n$) to within factors of $2^{\log^{1-o(1)} n}$, even when the inputs are restricted to CNFs and 3-CNFs with $O(n^{1+\epsilon})$ clauses, for some small $\epsilon > 0$; here $o(1) \approx (\log\log n)^{-c}$ for some constant $c \in (0, 1/2)$. We then show that unless the Exponential Time Hypothesis (ETH) [19] is false, it is not possible to approximate the minimum number of clauses and the minimum number of literals in time $\exp(n^{\delta})$, for some $\delta \in (0, 1)$, to within factors of $O(\log^{\beta} n)$, for some $\beta \in (0, 1)$, even when the inputs are restricted to 3-CNFs with $O(n^{1+\epsilon})$ clauses, for some small constant $\epsilon > 0$. For these same problems, we also obtain a hardness of approximation factor of $O(\log n)$ under slightly more stringent time constraints. We leave open the problem of determining hardness of approximation factors for $\exp(o(n))$ time.

The remainder of the text is organized as follows. We introduce some basic concepts about pure Horn functions in Section 2 and present the problem on which our reduction is based in Section 3. The reduction to pure Horn CNFs, its proof of correctness, and the polynomial time hardness of approximation result are shown in Section 4. In Section 5, we extend that result to pure Horn 3-CNF formulae and address the case of minimizing the number of literals. In Section 6, we show that sub-exponential time availability gives better but still super-constant hardness of approximation factors. We then offer some conclusions in Section 7. A conference version of this article appeared in Boros and Gruber [9].

2 Preliminaries 

In this section we succinctly define the main concepts and notations we shall use later on. For a more 
thorough exposition, consult the book on Boolean functions by Crama and Hammer [10].

A mapping $h : \{0, 1\}^n \to \{0, 1\}$ is called a Boolean function on $n$ propositional variables. Its set of variables is denoted by $V_h := \{v_1, \ldots, v_n\}$. A literal is either a propositional variable $v_i$ (positive literal) or its negation $\bar{v}_i$ (negative literal).

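An elementary disjunction of literals

$$C = \bigvee_{v \in I} \bar{v} \;\vee\; \bigvee_{v \in J} v \qquad (1)$$

with $I, J \subseteq V_h$ is a clause if $I \cap J = \emptyset$; the set of variables it depends upon is $\mathrm{Vars}(C) := I \cup J$ and its degree is given by $\deg(C) := |I \cup J|$. It is customary to identify a clause $C$ with its set of literals.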

1 Our reduction is actually approximation-preserving, but we do not rely on this characteristic; see Vazirani [26] or Williamson and Shmoys [27] for appropriate definitions.
Definition 1. A clause $C$ as in (1) is called pure Horn if $|J| = 1$. For a pure Horn clause $C$, the positive literal $v \in J$ is called its head and $S = C \setminus J$ is called its subgoal. To simplify notation, we sometimes write $C$ simply as $S \vee v$ or as the implication $S \to v$.

Definition 2. A conjunction $\Phi$ of pure Horn clauses is a pure Horn formula in Conjunctive Normal Form (for short, pure Horn CNF). In case every clause in $\Phi$ has degree at most three, the CNF is a 3-CNF. A Boolean function $h$ is called pure Horn if there is a pure Horn CNF formula $\Phi \equiv h$, that is, if $\Phi(v) = h(v)$ for all $v \in \{0, 1\}^n$.

Let $\Phi = \bigwedge_{i=1}^{n} C_i$ be a pure Horn CNF representing a pure Horn function $h$. We denote by

$$|\Phi|_c := n \quad\text{and}\quad |\Phi|_l := \sum_{i=1}^{n} \deg(C_i)$$

the numbers of clauses and literals of $\Phi$, respectively. We say that $\Phi$ is a clause (literal) minimum representation of $h$ if $|\Phi|_c \le |\Psi|_c$ ($|\Phi|_l \le |\Psi|_l$) for every other pure Horn CNF representation $\Psi$ of $h$. With this in mind, we define

$$\tau(h) := \min\{|\Phi|_c : \Phi \text{ is a pure Horn CNF representing } h\}, \text{ and}$$
$$\lambda(h) := \min\{|\Phi|_l : \Phi \text{ is a pure Horn CNF representing } h\}.$$

The clause (literal) pure Horn CNF minimization problem consists, as expected, of determining $\tau(h)$ ($\lambda(h)$) when $h$ is given as a pure Horn CNF. Similar definitions hold for the 3-CNF case.

A clause $C$ as in (1) is an implicate of a Boolean function $h$ if for all $v \in \{0, 1\}^n$ it holds that $h(v) = 1$ implies $C(v) = 1$. An implicate is prime if it is inclusion-wise minimal w.r.t. its set of literals. The set of prime implicates of $h$ is denoted by $I^p(h)$. It is known (cf. Hammer and Kogan [18]) that prime implicates of pure Horn functions are pure Horn clauses. A pure Horn CNF $\Phi$ representing $h$ is prime if its clauses are prime, and irredundant if the CNF obtained after removing any of its clauses no longer represents $h$. It is also known that every clause (literal) minimum representation of $h$ is prime and irredundant.

Let $C_1$ and $C_2$ be two clauses and $v$ be a variable such that $v \in C_1$, $\bar{v} \in C_2$, and $C_1$ and $C_2$ contain no other pair of complementary literals. The resolvent of $C_1$ and $C_2$ is the clause

$$R(C_1, C_2) := (C_1 \setminus \{v\}) \cup (C_2 \setminus \{\bar{v}\}),$$

and $C_1$ and $C_2$ are said to be resolvable. It is known (e.g., Crama and Hammer [10]) that if $C_1$ and $C_2$ are implicates of a Boolean function $h$, then so is $R(C_1, C_2)$. Naturally, a specialized result holds for pure Horn clauses: the resolvent of two resolvable pure Horn clauses is again a pure Horn clause.
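As a minimal sketch (our own illustration, not taken from the cited works), a pure Horn clause can be encoded as a pair (subgoal, head), and the resolvent of two resolvable pure Horn clauses computed directly from the definition above. The class `HornClause` and the function `resolvent` are hypothetical names; the function only resolves on the head of its first argument.

```python
from typing import FrozenSet, NamedTuple, Optional


class HornClause(NamedTuple):
    """Pure Horn clause S -> head, i.e. (OR of negated subgoal vars) OR head."""
    subgoal: FrozenSet[str]
    head: str


def resolvent(c1: HornClause, c2: HornClause) -> Optional[HornClause]:
    """Resolve c1 = (S1 -> v) with c2 = (S2 and v -> w) on the variable v.

    Returns None when c1's head does not occur in c2's subgoal, or when the
    result would contain a complementary pair (w occurring in the new subgoal),
    i.e. when the two clauses are not resolvable in this order.
    """
    v = c1.head
    if v not in c2.subgoal:
        return None
    new_subgoal = c1.subgoal | (c2.subgoal - {v})
    if c2.head in new_subgoal:
        return None  # w and its negation would both occur: not a clause
    # Exactly one positive literal (c2.head) survives, so the result is pure Horn.
    return HornClause(frozenset(new_subgoal), c2.head)


# Usage: (a -> b) and (b, c -> d) resolve on b to (a, c -> d).
c1 = HornClause(frozenset({"a"}), "b")
c2 = HornClause(frozenset({"b", "c"}), "d")
print(resolvent(c1, c2))
```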

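A set of clauses $\mathcal{C}$ is closed under resolution if for all resolvable $C_1, C_2 \in \mathcal{C}$, $R(C_1, C_2) \in \mathcal{C}$. The resolution closure of $\mathcal{C}$, $R(\mathcal{C})$, is the smallest set $\mathcal{X} \supseteq \mathcal{C}$ closed under resolution. For a Boolean function $h$, let $I(h) := R(I^p(h))$.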

The following definitions and lemma concerning exclusive sets of clauses are useful when decomposing 
and studying structural properties of Boolean functions. 

Definition 3 (Boros et al. [8]). Let $h$ be a Boolean function and $\mathcal{X} \subseteq I(h)$ be a set of clauses. $\mathcal{X}$ is an exclusive set of clauses of $h$ if for all resolvable clauses $C_1, C_2 \in I(h)$ it holds that $R(C_1, C_2) \in \mathcal{X}$ implies $C_1 \in \mathcal{X}$ and $C_2 \in \mathcal{X}$.

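Definition 4 (Boros et al. [8]). Let $\mathcal{X} \subseteq I(h)$ be an exclusive set of clauses for a Boolean function $h$ and let $\mathcal{C} \subseteq I(h)$ be such that $\mathcal{C} \equiv h$. The Boolean function $h_{\mathcal{X}}$ represented by $\mathcal{C} \cap \mathcal{X}$ is called the $\mathcal{X}$-component of $h$.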

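Lemma 5 (Boros et al. [8]). Let $\mathcal{C}_1, \mathcal{C}_2 \subseteq I(h)$, $\mathcal{C}_1 \neq \mathcal{C}_2$, be such that $\mathcal{C}_1 \equiv \mathcal{C}_2 \equiv h$, and let $\mathcal{X} \subseteq I(h)$ be an exclusive set of clauses. Then $\mathcal{C}_1 \cap \mathcal{X} \equiv \mathcal{C}_2 \cap \mathcal{X}$ and, in particular, $(\mathcal{C}_1 \setminus \mathcal{X}) \cup (\mathcal{C}_2 \cap \mathcal{X})$ also represents $h$.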

The following procedure is useful, among other things, in verifying whether a given clause is an implicate of a given pure Horn function. Let $\Phi$ be a pure Horn CNF representing a pure Horn function $h$ and let $Q \subseteq V_h$. The forward chaining of $Q$ on $\Phi$, $F_\Phi(Q)$, is defined by the following algorithm. Initially, $F_\Phi(Q) = Q$. As long as there is a pure Horn clause $S \vee v$ in $\Phi$ such that $S \subseteq F_\Phi(Q)$ and $v \notin F_\Phi(Q)$, add $v$ to $F_\Phi(Q)$. Two distinct pure Horn CNFs $\Phi$ and $\Psi$ represent the same pure Horn function $h$ if and only if $F_\Phi(U) = F_\Psi(U)$ for all $U \subseteq V_h$ (cf. Hammer and Kogan [18]). This implies we can do forward chaining on $h$, denoted by $F_h(U)$, using any of its representations.
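The forward chaining procedure above can be sketched in Python as follows, assuming (hypothetically) that a pure Horn CNF is given as a list of `(subgoal, head)` pairs with frozenset subgoals. This naive fixpoint loop is quadratic; the linear-time Horn algorithms cited in the introduction keep, per clause, a counter of subgoal variables not yet derived. `is_implicate` relies on the standard fact that $S \to v$ is an implicate of a pure Horn function $h$ exactly when $v \in F_h(S)$, and `equivalent` is a brute-force check of the criterion just stated, usable only on tiny variable sets.

```python
from itertools import chain, combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

# A pure Horn clause S -> v is encoded as the pair (frozenset(S), v).
Clause = Tuple[FrozenSet[str], str]


def forward_chaining(cnf: Iterable[Clause], start: Iterable[str]) -> Set[str]:
    """Compute F_Phi(Q): repeatedly add heads whose subgoals are already derived."""
    clauses: List[Clause] = list(cnf)
    closure = set(start)
    changed = True
    while changed:
        changed = False
        for subgoal, head in clauses:
            if head not in closure and subgoal <= closure:
                closure.add(head)
                changed = True
    return closure


def is_implicate(cnf: Iterable[Clause], subgoal: Iterable[str], head: str) -> bool:
    """S -> v is an implicate of the function represented by cnf iff v is in F(S)."""
    return head in forward_chaining(cnf, subgoal)


def equivalent(cnf1: Iterable[Clause], cnf2: Iterable[Clause],
               variables: Iterable[str]) -> bool:
    """Brute-force check F_Phi(U) == F_Psi(U) for all U (exponential; toy sizes only)."""
    c1, c2 = list(cnf1), list(cnf2)
    vs = sorted(variables)
    subsets = chain.from_iterable(combinations(vs, k) for k in range(len(vs) + 1))
    return all(forward_chaining(c1, u) == forward_chaining(c2, u) for u in subsets)


phi = [(frozenset({"a"}), "b"), (frozenset({"b", "c"}), "d"), (frozenset({"d"}), "e")]
print(sorted(forward_chaining(phi, {"a", "c"})))  # ['a', 'b', 'c', 'd', 'e']
```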

Also, as stated by the next lemma, the forward chaining procedure provides a way of identifying exclusive families for pure Horn functions. Since, at the time this article was written, this lemma was available only in an unpublished manuscript by Boros, Cepek, and Kucera entitled "Finding a shortest representation of a pure Horn 3-CNF is hard", we include its proof below for completeness.

Lemma 6. Let $\Phi$ be a pure Horn CNF representing the function $h$, let $W \subseteq V_h$ be such that $F_\Phi(W) = W$, and define $\mathcal{X}(W) := \{C \in I(h) : \mathrm{Vars}(C) \subseteq W\}$. Then $\mathcal{X}(W)$ is an exclusive family for $h$.

Proof. Let $W$ be as specified in the lemma's statement and suppose there are clauses $C_1, C_2 \in I(h)$ such that $R(C_1, C_2) \in \mathcal{X}(W)$ but $\{C_1, C_2\} \not\subseteq \mathcal{X}(W)$. By the definitions of $\mathcal{X}(W)$ and of resolution, all but one of the variables in $C = C_1 \cup C_2$ must belong to $W$, and that variable, say $v \in \mathrm{Vars}(C) \setminus W$, is precisely the variable upon which $C_1$ and $C_2$ are resolvable; that is, $v$ occurs as head in one of those clauses, say $C_1$. All other variables of $C_1$ lie in $W$, so $C_1$ is an implicate of $h$ whose subgoal is contained in $W$. Now, since for the same input the forward chaining outcome is independent of the pure Horn representation of $h$, it follows that $v \in F_\Phi(W) \setminus W$, contradicting $F_\Phi(W) = W$. □

3 The Label-Cover Problem 

The Label-Cover problem is a graph labeling promise problem formally introduced in Arora et al. [1] as a combinatorial abstraction of interactive proof systems (two-prover one-round in Feige et al. [H] and Feige and Lovasz [13], and probabilistically checkable in Arora and Motwani [3] and Arora and Safra [4]). It comes in maximization and minimization flavors (linked by a 'weak duality' relation) and is probably the most popular starting point for hardness of approximation reductions. In this section, we introduce a minimization version that is best suited for our polynomial time results. Later, when dealing with sub-exponential time results in Section 6, we briefly mention its maximization counterpart.

Definition 7. A Label-Cover instance is a quadruple $\mathcal{L}_0 = (G, L_0, L'_0, \Pi_0)$, where $G = (X, Y, E)$ is a bipartite graph, $L_0$ and $L'_0$ are disjoint sets of labels for the vertices in $X$ and $Y$, respectively, and $\Pi_0 = (\Pi^0_e)_{e \in E}$ is a set of constraints, with each $\Pi^0_e \subseteq L_0 \times L'_0$ being a non-empty relation of admissible pairs of labels for the edge $e$. The size of $\mathcal{L}_0$ is a function of the number of vertices in $Y$.

Definition 8. A labeling for $\mathcal{L}_0$ is any function $f_0 : X \to 2^{L_0}, Y \to 2^{L'_0} \setminus \{\emptyset\}$ assigning subsets of labels to vertices. A labeling $f_0$ covers an edge $(x, y)$ if for every label $\ell' \in f_0(y)$ there is a label $\ell \in f_0(x)$ such that $(\ell, \ell') \in \Pi^0_{(x,y)}$. A total-cover for $\mathcal{L}_0$ is a labeling that covers every edge in $E$. $\mathcal{L}_0$ is said to be feasible if it admits a total-cover.
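A small sketch of Definition 8, under the assumption (ours, for illustration only) that a labeling is a dict from vertices to label sets and that the constraints are a dict from edges $(x, y)$ to sets of admissible pairs $(\ell, \ell')$. The function names `covers` and `is_total_cover` are hypothetical; the edge set is taken to be the keys of the constraint dict.

```python
from collections.abc import Hashable
from typing import Dict, Set, Tuple

Vertex = Hashable
Label = Hashable
Edge = Tuple[Vertex, Vertex]  # (x, y) with x in X and y in Y


def covers(labeling: Dict[Vertex, Set[Label]],
           constraints: Dict[Edge, Set[Tuple[Label, Label]]],
           edge: Edge) -> bool:
    """Definition 8: every label l' of y has a partner l of x with (l, l') admissible."""
    x, y = edge
    allowed = constraints[edge]
    return all(any((l, lp) in allowed for l in labeling[x]) for lp in labeling[y])


def is_total_cover(labeling: Dict[Vertex, Set[Label]],
                   constraints: Dict[Edge, Set[Tuple[Label, Label]]]) -> bool:
    """Each y must get a non-empty label set, and every edge must be covered."""
    ys = {y for (_, y) in constraints}
    if any(not labeling[y] for y in ys):
        return False
    return all(covers(labeling, constraints, e) for e in constraints)
```

For instance, with constraints `{("x1", "y1"): {(1, "a"), (2, "b")}, ("x2", "y1"): {(1, "a")}}`, the labeling `{"x1": {1}, "x2": {1}, "y1": {"a"}}` is a total-cover, while giving `y1` both labels `"a"` and `"b"` leaves the edge `("x2", "y1")` uncovered.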

It is well known (cf. Arora and Lund [5]) that non-feasible Label-Cover instances can be easily transformed into feasible ones preserving the same structural relations. Henceforth, we assume that every Label-Cover instance is feasible.

Definition 9. For a total-cover $f_0$ of $\mathcal{L}_0$, let $f_0(Z) := \sum_{z \in Z} |f_0(z)|$ with $Z \subseteq X \cup Y$. The cost of $f_0$ is given by $\kappa(f_0) := f_0(X)/|X|$, and $f_0$ is said to be optimal if $\kappa(f_0)$ is minimum among the costs of all total-covers for $\mathcal{L}_0$.

It follows that $1 \le \kappa(f_0) \le |L_0|$ for any total-cover $f_0$ and, without loss of generality, we can assume $G$ has no isolated vertices (for they do not influence the cost of any labeling).

Definition 10. A total-cover $f_0$ is tight if $f_0(Y) = |Y|$, i.e., if for every $y \in Y$ it holds that $|f_0(y)| = 1$.

Lemma 11. Every Label-Cover instance $\mathcal{L}_0$ admits a tight optimal total-cover $f_0 : X \to 2^{L_0}, Y \to L'_0$.
Proof. Among all optimal total-covers for $\mathcal{L}_0$, let $f_0$ be one minimizing $f_0(Y)$, and suppose that $f_0$ is not tight. Hence, there is a $y \in Y$ such that $|f_0(y)| > 1$. Let $\ell' \in f_0(y)$ and define a new labeling $g$ where $g(z) = f_0(z)$ for all $z \in X \cup (Y \setminus \{y\})$ and $g(y) = f_0(y) \setminus \{\ell'\}$. Note that $g(y) \neq \emptyset$ and that every edge $(x, y)$ for $x \in N(y) := \{z \in X : (z, y) \in E\}$ is still covered, since the covering condition only quantifies over the labels of $y$ and $f_0$ is a total-cover. Moreover, clearly $\kappa(f_0) = \kappa(g)$, as $f_0$ and $g$ coincide on $X$. Hence, $g$ is an optimal total-cover for $\mathcal{L}_0$ with $g(Y) < f_0(Y)$, contradicting the choice of $f_0$. The result thus follows. □
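The proof of Lemma 11 can be mirrored by a short sketch, assuming (hypothetically) the same dict-based labelings as before and a labeling that is already a total-cover, so every $f_0(y)$ is non-empty. Keeping a single label at each $y$ performs all the single-label removals of the proof at once; the cost $\kappa$ is unchanged because it only counts labels on $X$. The names `cost`, `is_tight`, and `tighten` are illustrative.

```python
from collections.abc import Hashable
from typing import Dict, Iterable, Set

Vertex = Hashable


def cost(labeling: Dict[Vertex, Set], xs: Iterable[Vertex]) -> float:
    """Definition 9: kappa(f) = f(X) / |X|, counting only labels on X."""
    xs = list(xs)
    return sum(len(labeling[x]) for x in xs) / len(xs)


def is_tight(labeling: Dict[Vertex, Set], ys: Iterable[Vertex]) -> bool:
    """Definition 10: every y in Y carries exactly one label."""
    return all(len(labeling[y]) == 1 for y in ys)


def tighten(labeling: Dict[Vertex, Set], ys: Iterable[Vertex]) -> Dict[Vertex, Set]:
    """Lemma 11: keep one (arbitrary) label at each y; labels on X are untouched.

    Assumes labeling is a total-cover, so each labeling[y] is non-empty.
    Returns a new dict; the input labeling is not modified.
    """
    tight = {v: set(ls) for v, ls in labeling.items()}
    for y in ys:
        tight[y] = {min(tight[y], key=repr)}  # deterministic but arbitrary choice
    return tight
```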

Notation 12. For $\mathcal{L}_0$ being a Label-Cover instance as in Definition 7, define $r := |X|$, $s := |Y|$, $m := |E|$, $\lambda := |L_0|$, $\lambda' := |L'_0|$, $\pi_e := |\Pi^0_e|$ for $e \in E$, and set $\pi := \sum_{e \in E} \pi_e$.

Problem 13. For any $\rho > 0$, a Label-Cover instance $\mathcal{L}_0$ has promise $\rho$ if it falls in one of two cases: either there is a tight optimal total-cover for $\mathcal{L}_0$ of cost 1, or every tight optimal total-cover for $\mathcal{L}_0$ has cost at least $\rho$. The Label-Cover$_\rho$ problem is a promise problem that receives a Label-Cover instance with promise $\rho$ as input and must correctly classify it into one of those two cases.

Notice that the behavior of Label-Cover$_\rho$ is left unspecified for non-promise instances, so any answer is acceptable in that case. Due to this behavior, the Label-Cover$_\rho$ problem is also referred to in the literature as a gap-problem with gap $\rho$. The following result holds for it.

Theorem 14 (Dinur and Safra [12]). Let $c$ be a constant in $(0, 1/2)$ and $\rho_c(s) := 2^{(\log s)^{1 - 1/\delta_c(s)}}$ with $\delta_c(s) := (\log\log s)^c$. There are Label-Cover instances $\mathcal{L}_0$ with promise $\rho_c(s)$ such that it is NP-hard to distinguish between the cases in which $\kappa(\mathcal{L}_0)$ is equal to 1 or at least $\rho_c(s)$, where $\kappa(\mathcal{L}_0)$ is the cost of a tight optimal total-cover for $\mathcal{L}_0$.

Remark 15. Every Label-Cover instance produced by Dinur and Safra's reduction is feasible, has promise $\rho_c(s)$, and the following relations hold: $r = s\lfloor\delta_c(s)\rfloor$, $\lambda = O(\rho_c(s))$, $\lambda' = O(\rho_c(s)^{\delta_c(s)}) = o(s)$, $s\lfloor\delta_c(s)\rfloor \le m \le s^2\lfloor\delta_c(s)\rfloor$, and $\pi \le m\lambda\lambda' = O(s^2\delta_c(s)\rho_c(s)^{\delta_c(s)+1}) = o(s^3)$, for $s$ and $c$ as specified in Notation 12 and Theorem 14, respectively.

Without any loss of generality, it is possible to modify the Label-Cover definitions so that the vertices in $X$ and $Y$ have their own copies of the label-sets $L_0$ and $L'_0$, respectively.

Definition 16. Let $\mathcal{L}_0 = (G, L_0, L'_0, \Pi_0)$ be a feasible Label-Cover instance and consider the sets $L_x := \{(x, \ell) : \ell \in L_0\}$ for each vertex $x \in X$, and $L'_y := \{(y, \ell') : \ell' \in L'_0\}$ for each vertex $y \in Y$. Also, define the sets $L := \bigcup_{x \in X} L_x$, $L' := \bigcup_{y \in Y} L'_y$, and $\Pi := \bigcup_{(x,y) \in E} \Pi_{(x,y)}$, with

$$\Pi_{(x,y)} := \{((x, \ell), (y, \ell')) : (\ell, \ell') \in \Pi^0_{(x,y)}\}.$$

The quadruple $\mathcal{L} = (G, L, L', \Pi)$ is an extension of $\mathcal{L}_0$.

It is clear that $|L_x| = \lambda$ for each vertex $x \in X$, $|L'_y| = \lambda'$ for each vertex $y \in Y$, $|\Pi_e| = \pi_e$ for each edge $e \in E$, $|\Pi| = \pi$, and that a labeling for $\mathcal{L}$ is a mapping $f$ such that $x \mapsto f(x) \subseteq L_x$ for each vertex $x \in X$, and $y \mapsto f(y) \subseteq L'_y$, $f(y) \neq \emptyset$, for each vertex $y \in Y$. Furthermore, the remaining definitions and concepts extend in a straightforward fashion.
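A minimal sketch of Definition 16 and of the correspondence used in Lemma 17, assuming (for illustration only) that vertices and labels are hashable Python values and that an extended label is the tuple `(vertex, label)`. The helper names `extend` and `lift` are hypothetical.

```python
from collections.abc import Hashable
from typing import Dict, Iterable, Set, Tuple

Vertex = Hashable
Label = Hashable
Edge = Tuple[Vertex, Vertex]


def extend(xs: Iterable[Vertex], ys: Iterable[Vertex],
           labels0: Iterable[Label], labels0_prime: Iterable[Label],
           constraints0: Dict[Edge, Set[Tuple[Label, Label]]]):
    """Definition 16: give every vertex its own tagged copy of the label set."""
    labels0, labels0_prime = list(labels0), list(labels0_prime)
    L_x = {x: {(x, l) for l in labels0} for x in xs}
    L_y = {y: {(y, lp) for lp in labels0_prime} for y in ys}
    Pi = {(x, y): {((x, l), (y, lp)) for (l, lp) in rel}
          for (x, y), rel in constraints0.items()}
    return L_x, L_y, Pi


def lift(labeling0: Dict[Vertex, Set[Label]]) -> Dict[Vertex, Set[Tuple[Vertex, Label]]]:
    """Lemma 17: f(v) = {(v, l) : l in f0(v)}, which preserves the cost kappa."""
    return {v: {(v, l) for l in ls} for v, ls in labeling0.items()}
```

Because every extended label carries its vertex, the sets $L_x$ are pairwise disjoint, which is what lets each label later become its own propositional variable $u(\ell)$.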

Lemma 17. There is a one-to-one cost preserving correspondence between solutions to the Label-Cover$_\rho$ problem and to the Extended Label-Cover$_\rho$ problem, for any $\rho > 0$, and the sizes of both instances are linearly related.

Proof. It is easy to see that $f_0$ is a (tight) total-cover for the Label-Cover$_\rho$ problem if and only if $f$ is a (tight) total-cover for the Extended Label-Cover$_\rho$ problem, where $f(x) = \{(x, \ell) : \ell \in f_0(x)\}$ for every $x \in X$, and $f(y) = \{(y, \ell') \in L'_y : \ell' \in f_0(y)\}$ for every $y \in Y$ (or $f(y) = (y, f_0(y))$ if the total-covers are tight). Furthermore, it is clear that $\kappa(f_0) = \kappa(f)$ in any case. The linearity relation on the instances' sizes follows from Definition 16. □
4 Reduction to pure Horn CNFs and a Polynomial Time Hardness Result



Let $\mathcal{L} = (G = (X, Y, E), L, L', \Pi)$ be a Label-Cover extension, and let $d$ and $t$ be positive integers to be specified later. For nonnegative integers $n$, we define $[n]$ as the set $\{1, \ldots, n\}$.

Associate propositional variables $u(\ell)$ with every label $\ell \in L \cup L'$, and $e(x, y, i)$ and $e(x, y, \ell', i)$ with every edge $(x, y) \in E$, every label $\ell' \in L'_y$, and every index $i \in [d]$. Let $v(j)$, for indices $j \in [t]$, be extra variables and consider the following families of clauses:



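(a) $u(\ell) \wedge u(\ell') \to e(x, y, \ell', i)$ for all $(x, y) \in E$, $(\ell, \ell') \in \Pi_{(x,y)}$, $i \in [d]$;
(b) $\bigwedge_{z \in N(y)} e(z, y, \ell', i) \to e(x, y, i)$ for all $(x, y) \in E$, $\ell' \in L'_y$, $i \in [d]$;
(c) $e(x, y, i) \to e(x, y, \ell', i)$ for all $(x, y) \in E$, $\ell' \in L'_y$, $i \in [d]$;
(d) $\bigwedge_{i \in [d]} \bigwedge_{(x,y) \in E} e(x, y, i) \to u(\ell)$ for all $\ell \in L \cup L'$;
(e) $v(j) \to u(\ell)$ for all $j \in [t]$, $\ell \in L \cup L'$,

where, as before, $N(y) := \{x \in X : (x, y) \in E\}$ for each $y \in Y$.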

Let us call $\Psi$ and $\Phi$ the canonical pure Horn CNF formulae defined, respectively, by the families of clauses (a) through (d) and by all the families of clauses above, and let $g$ and $h$, in that order, be the pure Horn functions they represent.

Lemma 18. For $d$ and $t$ as above and $r$, $s$, $m$, $\lambda$, $\lambda'$, and $\pi$ as in Notation 12, the numbers of clauses and variables in $\Phi$ are $|\Phi|_c = (t + 1)(r\lambda + s\lambda') + d(\pi + 2m\lambda')$ and $|\Phi|_v = t + dm(\lambda' + 1) + r\lambda + s\lambda'$, respectively.

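Proof. Let $\#(\mathrm{a})$ denote the number of clauses of type (a) in $\Phi$, and similarly for the other types. It is not difficult to see that

$$\#(\mathrm{a}) = d\pi, \quad \#(\mathrm{b}) = dm\lambda', \quad \#(\mathrm{c}) = dm\lambda', \quad \#(\mathrm{d}) = r\lambda + s\lambda', \quad \#(\mathrm{e}) = t(r\lambda + s\lambda').$$

For the number of variables, just notice there are $r\lambda + s\lambda'$ variables $u(\ell)$, $dm$ variables $e(x, y, i)$, $dm\lambda'$ variables $e(x, y, \ell', i)$, and $t$ variables $v(j)$. □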
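To make the construction concrete, the following sketch builds $\Psi$ and $\Phi$ for a toy Label-Cover instance and checks the counts of Lemma 18, as well as the behavior later stated in Lemma 21. It is our own illustration: variables are encoded (by assumption) as tuples such as `("u", label)`, `("e", x, y, i)`, `("e", x, y, label, i)` and `("v", j)`, clauses as `(subgoal, head)` pairs, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def canonical_cnf(xs, ys, labels0, labels0_prime, constraints0, d, t):
    """Build Psi (clauses (a)-(d)) and Phi = Psi + (e) as lists of (subgoal, head)."""
    Lx = {x: [(x, l) for l in labels0] for x in xs}        # extended labels L_x
    Ly = {y: [(y, lp) for lp in labels0_prime] for y in ys}  # extended labels L'_y
    edges = list(constraints0)
    nbrs = defaultdict(list)  # N(y)
    for x, y in edges:
        nbrs[y].append(x)
    all_labels = [l for x in xs for l in Lx[x]] + [lp for y in ys for lp in Ly[y]]

    def u(lab):
        return ("u", lab)

    psi = []
    for (x, y), i in product(edges, range(1, d + 1)):
        for l, lp in constraints0[(x, y)]:  # (a) u(l) & u(l') -> e(x,y,l',i)
            psi.append((frozenset({u((x, l)), u((y, lp))}), ("e", x, y, (y, lp), i)))
        for lp in Ly[y]:
            # (b) AND_{z in N(y)} e(z,y,l',i) -> e(x,y,i)
            psi.append((frozenset(("e", z, y, lp, i) for z in nbrs[y]), ("e", x, y, i)))
            # (c) e(x,y,i) -> e(x,y,l',i)
            psi.append((frozenset({("e", x, y, i)}), ("e", x, y, lp, i)))
    all_e = frozenset(("e", x, y, i) for (x, y) in edges for i in range(1, d + 1))
    psi += [(all_e, u(lab)) for lab in all_labels]  # (d)
    phi = psi + [(frozenset({("v", j)}), u(lab))    # (e)
                 for j in range(1, t + 1) for lab in all_labels]
    return psi, phi


def variables(cnf):
    """All variables occurring in a list of (subgoal, head) clauses."""
    return {v for subgoal, head in cnf for v in subgoal | {head}}


def forward_chaining(cnf, start):
    """Naive fixpoint computation of F(start) on a pure Horn CNF."""
    closure, changed = set(start), True
    while changed:
        changed = False
        for subgoal, head in cnf:
            if head not in closure and subgoal <= closure:
                closure.add(head)
                changed = True
    return closure
```

On the feasible toy instance with $X = \{x_1, x_2\}$, $Y = \{y_1\}$, $L_0 = \{1, 2\}$, $L'_0 = \{a, b\}$ and three admissible pairs ($r = 2$, $s = 1$, $m = 2$, $\lambda = \lambda' = 2$, $\pi = 3$), choosing $d = r\lambda + s\lambda' + 1 = 7$ and $t = 3$ gives $|\Phi|_c = 101$ and $|\Phi|_v = 51$, matching Lemma 18.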

The driving idea behind our reduction is to tie the cost of tight optimal solutions of $\mathcal{L}$ to the size of clause minimum pure Horn CNF representations of $h$. Once this is done, we achieve the desired dichotomy using a gap amplification device (the parameter $t$, on which the clauses of type (e) depend). In more detail, the families of clauses (a) through (d) form the core of our gadget, as they locally reproduce the Label-Cover structural properties: clauses of type (a) correspond to the constraints on the pairs of labels that can be assigned to each edge; clauses of type (b) represent the condition for an edge to be considered covered; clauses of type (c) state that if an edge can be covered in a certain way, then it can be covered in all legal ways (implying that it is not necessary to keep track of more than one covering possibility for each edge in clause minimum representations); clauses of type (d) translate the total-cover requirement and reintroduce all the available labels, ensuring that if a total-cover is achieved, so are all the others (this is the basis of our gadget's global behavior). The second part of our gadget consists of the family of clauses (e), which locally plays the role of introducing an initial collection of labels. On a global level, as long as this initial collection of labels forms a total-cover, the remaining clauses of type (e) can be removed from the CNF representation of $h$ without any loss (since the clauses of type (d) reintroduce all the labels). However, a priori there is no guarantee that a subset of the clauses of type (e) belongs to a clause minimum pure Horn CNF representation of $h$, as it could be advantageous to have some prime implicates with $v(j)$, for $j \in [t]$, in their subgoals and variables other than $u(\ell)$, for $\ell \in L \cup L'$, as heads. It could even be the case that only $O(t)$ such prime implicates are needed in a clause minimum pure Horn CNF representation of $h$, rendering the gap amplification device innocuous. To avoid this undesirable scenario, we introduce another amplification device (the parameter $d$, on which the clauses (a) through (d) depend) that helps to shape the prime implicates involving the variables $v(j)$ into an appropriate form.

Let $V_g$ be the set of variables occurring in $\Psi$; by definition, these are the variables the function $g$ depends on. Clearly, $V_g$ is closed under forward chaining on $\Phi$, as no clause in $\Psi$ has its head outside $V_g$ and no clause in $\Phi \setminus \Psi$ has its subgoal contained in $V_g$. Hence, since $\Phi$ represents $h$, Lemma 6 implies that $\mathcal{X}(V_g)$, as defined there, is an exclusive family for $h$. The following corollary is then immediate.

Corollary 19. The function g is an exclusive component of the function h. Consequently, g can be minimized 
separately. 

With some effort, it is possible to prove that $\Psi$ is a clause minimum pure Horn CNF representation of $g$. For our proofs, however, a weaker result suffices.

Lemma 20. Let $\Theta$ be a clause minimum pure Horn CNF representation of $g$. We have $|\Psi|_c/(\lambda + \lambda') \le |\Theta|_c \le |\Psi|_c$.

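Proof. The upper bound is obvious. For the lower bound, observe that each variable of $g$ is the head of at most $\lambda + \lambda'$ clauses of $\Psi$, so $|\Psi|_c \le (\lambda + \lambda')|V_g|$. As every variable of $g$ must appear as a head at least once in any clause minimum representation of $g$, we have $|\Theta|_c \ge |V_g|$, and the claim follows. □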

Lemma 21. For all $j \in [t]$, it holds that $F_h(\{v(j)\}) = V_g \cup \{v(j)\}$.

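Proof. Fix $j \in [t]$. As $v(j) \to u(\ell)$ is a clause of $\Phi$ for every label $\ell \in L \cup L'$, all the variables $u(\ell)$ belong to $F_\Phi(\{v(j)\})$. Since $\mathcal{L}$ is feasible, it admits a total-cover $f$; for every $y \in Y$, pick a label $\ell' \in f(y)$. For every $z \in N(y)$ there is a label $\ell \in f(z)$ with $(\ell, \ell') \in \Pi_{(z,y)}$, so the clauses of type (a) put every $e(z, y, \ell', i)$, $z \in N(y)$, $i \in [d]$, into $F_\Phi(\{v(j)\})$, and the clauses of type (b) then add $e(x, y, i)$ for all $x \in N(y)$ and $i \in [d]$. Hence $\{e(x, y, i) : (x, y) \in E, i \in [d]\} \subseteq F_\Phi(\{v(j)\})$, the clauses of type (c) add every variable $e(x, y, \ell', i)$, and so $V_g \subseteq F_\Phi(\{v(j)\})$. Finally, no clause of $\Phi$ has a variable $v(j')$ as its head, so $F_h(\{v(j)\}) = F_\Phi(\{v(j)\}) = V_g \cup \{v(j)\}$. □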

Lemma 22. A variable $v(j)$, for some index $j \in [t]$, is never the head of an implicate of $h$. Moreover, every prime implicate of $h$ involving $v(j)$ is quadratic.

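Proof. The first claim is straightforward, as all implicates of $h$ can be derived from $\Phi$ by resolution, and $v(j)$ is not the head of any clause of $\Phi$. By Lemma 21, $v(j) \to z$ is an implicate of $h$ for all $z \in V_g$. Since $h$ is a pure Horn function, its prime implicates are pure Horn; any prime implicate involving $v(j)$ has $v(j)$ in its subgoal and, by the first claim, a head $z \in V_g$, so it must coincide with the quadratic implicate $v(j) \to z$. The claim thus follows. □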

Lemma 23. For $d > r\lambda + s\lambda'$, the prime implicates involving variables $v(j)$, for indices $j \in [t]$, in any clause minimum pure Horn CNF representation of $h$ have the form $v(j) \to u(\ell)$ with $\ell \in L \cup L'$.

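Proof. By Lemma 22, there are only three types of prime implicates involving the variables $v(j)$, with $j \in [t]$: $v(j) \to u(\ell)$, $v(j) \to e(x, y, i)$, and $v(j) \to e(x, y, \ell', i)$. Let $\Gamma = \Theta \wedge \Gamma'$ be a clause minimum pure Horn CNF representation of $h$, with $\Theta$ being a clause minimum pure Horn CNF representation of $g$. Let $j \in [t]$ and, for $i \in [d]$, consider the sets

$$\Gamma^j_0 := \Gamma \cap \{v(j) \to u(\ell) : \ell \in L \cup L'\},$$
$$\Gamma^j_i := \Gamma \cap \{v(j) \to e(x, y, i),\ v(j) \to e(x, y, \ell', i) : (x, y) \in E,\ \ell' \in L'_y\}.$$

Suppose $\sum_{j \in [t],\, i \in [d]} |\Gamma^j_i| > 0$, for otherwise there is nothing to show. Fix some $j \in [t]$, let $i_1, i_2 \in [d]$, $i_1 \neq i_2$, and adjust the notation such that $|\Gamma^j_{i_1}| \le |\Gamma^j_{i_2}|$. If the inequality is strict then, as the conclusions drawn in $h$ must be the same for each $i \in [d]$, the formula $\Gamma'' = (\Gamma \setminus \Gamma^j_{i_2}) \cup \Lambda^j_{i_1 \to i_2}$ is also a representation of $h$, where

$$\Lambda^j_{i_1 \to i_2} := \{v(j) \to e(x, y, i_2) : v(j) \to e(x, y, i_1) \in \Gamma^j_{i_1}\} \cup \{v(j) \to e(x, y, \ell', i_2) : v(j) \to e(x, y, \ell', i_1) \in \Gamma^j_{i_1}\}.$$

Since $|\Gamma''|_c < |\Gamma|_c$, we get a contradiction with the clause minimality of $\Gamma$. Thus, $|\Gamma^j_i| = \gamma_j$ for all $i \in [d]$, for some integer $\gamma_j \ge 0$. Choosing $j$ with $\gamma_j \ge 1$ (which exists by our assumption), this implies $\sum_{i \in [d]} |\Gamma^j_i| = d\gamma_j \ge d$. As $d > r\lambda + s\lambda' = |L \cup L'|$, the pure Horn CNF

$$A_j := \Big(\Gamma \setminus \bigcup_{i \in [d]} \Gamma^j_i\Big) \cup \{v(j) \to u(\ell) : \ell \in L \cup L'\}$$

has fewer clauses than $\Gamma$. As $A_j$ represents $h$ as well, this contradicts the clause minimality of $\Gamma$, and the result follows. □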

Definition 24. Let $\Gamma$ be a clause minimum pure Horn CNF representation of $h$. For each $j \in [t]$, consider the set $S_j := \{\ell \in L \cup L' : v(j) \to u(\ell) \in \Gamma\}$ and define the labeling $f_j : X \to 2^{L}, Y \to 2^{L'}$ given by $f_j(x) = S_j \cap L_x$ for vertices $x \in X$ and $f_j(y) = S_j \cap L'_y$ for vertices $y \in Y$.

Lemma 25. For all indices $j \in [t]$ and vertices $y \in Y$, it holds that $|f_j(y)| = 1$.

Proof. Assume the opposite and, for some $j \in [t]$ and vertex $y \in Y$ such that $|f_j(y)| > 1$, let $\ell' \in f_j(y)$. Let $\Gamma' = \Gamma \setminus \{v(j) \to u(\ell')\}$ (cf. Lemma 23). If $F_{\Gamma'}(\{v(j)\}) = V_g \cup \{v(j)\}$, then $\Gamma$ is not clause minimum and we are done. Therefore, assume that this is not the case and fix an index $i \in [d]$ arbitrarily.

As $u(\ell') \notin F_{\Gamma'}(\{v(j)\})$, there must be an edge $(x, y) \in E$ such that $e(x, y, i) \notin F_{\Gamma'}(\{v(j)\})$ as well; hence, for every label $\ell'' \in L'_y$ there is a vertex $z \in N(y)$ for which $e(z, y, \ell'', i) \notin F_{\Gamma'}(\{v(j)\})$. This is the case when, for every label $\ell \in L_z$ with $(\ell, \ell'') \in \Pi_{(z,y)}$, the subgoal of the clause $u(\ell) \wedge u(\ell'') \to e(z, y, \ell'', i)$ is not contained in $F_{\Gamma'}(\{v(j)\})$. Now, since $u(\ell'') \in F_{\Gamma'}(\{v(j)\})$ for all labels $\ell'' \in f_j(y) \setminus \{\ell'\}$, it holds that $u(\ell) \notin F_{\Gamma'}(\{v(j)\})$ for all labels $\ell \in L_z$ such that $(\ell, \ell'') \in \Pi_{(z,y)}$ and $\ell'' \in f_j(y) \setminus \{\ell'\}$. This gives $\Gamma' \equiv \Gamma' \setminus A$, with $A = \{v(j) \to u(\ell'') : \ell'' \in f_j(y) \setminus \{\ell'\}\}$, and implies $\Gamma \equiv \Gamma \setminus A$, contradicting the clause minimality of $\Gamma$. The claim thus follows. □

Lemma 26. For each index $j \in [t]$, the function $f_j$ is a tight total-cover for $\mathcal{L}$.

Proof. Fix an index $j \in [t]$. By construction, $f_j$ is a labeling for $\mathcal{L}$, and Lemma 25 implies that $f_j$ is tight, independently of whether it is a total-cover. Suppose $f_j$ is not a total-cover for $\mathcal{L}$. Then there exist an edge $(x, y) \in E$ and a label $\ell' \in f_j(y)$ such that $(\ell, \ell') \notin \Pi_{(x,y)}$ for all labels $\ell \in f_j(x)$. As $\Gamma$ is clause minimum, no variable $e(x, y, \ell', i)$, $i \in [d]$, belongs to $F_\Gamma(\{v(j)\})$. This, however, contradicts the fact that $\Gamma$ represents $h$ (cf. Lemma 21). □

In order to relate the size of a clause minimum representation of $h$ to the cost of an optimal solution to $\mathcal{L}$, we need a comparison object. Let $f$ be a tight total-cover for $\mathcal{L}$ and consider the following subfamily of clauses:

(e') $v(j) \to u(\ell)$ for all $j \in [t]$, $x \in X$, $y \in Y$, $\ell \in f(x) \cup f(y)$,

with $f(x) \subseteq L_x$ and $f(y) \subseteq L'_y$. Let $\Phi_f$ be the refined canonical (w.r.t. $f$) pure Horn CNF formula resulting from the conjunction of $\Psi$ with the implications (e').
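The number of clauses that (e') adds to $\Psi$ follows from a direct count, assuming, as above, that $f$ is tight and noting that the extended label sets $L_x$ and $L'_y$ are pairwise disjoint, so distinct labels give distinct clauses (e'):

```latex
\begin{align*}
|\Phi_f \setminus \Psi|_c
  &= t\Big(\sum_{x \in X} |f(x)| + \sum_{y \in Y} |f(y)|\Big)
   = t\big(f(X) + f(Y)\big) \\
  &= t\big(\kappa(f)\,|X| + |Y|\big)
   = t\big(\kappa(f)\,r + s\big),
\end{align*}
```

using $\kappa(f) = f(X)/|X|$ (Definition 9) and $f(Y) = |Y|$ for a tight $f$ (Definition 10).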

Lemma 27. $\Phi_f$ represents $h$.

Proof. Suppose the opposite. As $g$ is also an exclusive component of $\Phi_f$, this means that $u(\ell'') \notin F_{\Phi_f}(\{v(j)\})$ for some label $\ell'' \in L \cup L'$ and index $j \in [t]$. For that to happen, for any index $i \in [d]$ there must be an edge $(x, y) \in E$ such that $e(x, y, i) \notin F_{\Phi_f}(\{v(j)\})$ and, for every $\ell' \in L'_y$, a vertex $z \in N(y)$ such that $e(z, y, \ell', i) \notin F_{\Phi_f}(\{v(j)\})$ as well. But since $f$ is a tight total-cover, for the label $\ell'_y \in f(y)$ and every $z \in N(y)$ there is a label $\ell_z \in f(z)$ with $(\ell_z, \ell'_y) \in \Pi_{(z,y)}$, and the clause $u(\ell_z) \wedge u(\ell'_y) \to e(z, y, \ell'_y, i)$ fires, as both $u(\ell_z)$ and $u(\ell'_y)$ belong to $F_{\Phi_f}(\{v(j)\})$ by (e'), contradicting the assumption. Therefore, $F_{\Phi_f}(\{v(j)\}) = V_g \cup \{v(j)\}$ and the claim follows (cf. Lemma 21). □

Lemma 28. For each index $j \in [t]$, $f_j$ is a tight minimum cost total-cover for $\mathcal{L}$.

Proof. Fix an index $j \in [t]$. Suppose the tight total-cover $f_j$ for $\mathcal{L}$ (cf. Lemma 26) is not optimal, and let $f^*$ be a minimum cost one. Consider the refined canonical formula $\Phi_{f^*}$ and define $\Gamma_{f^*} = (\Phi_{f^*} \setminus \Psi) \cup \Theta$, where $\Theta \subseteq \Gamma$ represents $g$. Lemma 27 and Corollary 19 guarantee that $\Gamma_{f^*}$ represents $h$ as well. As $|\Gamma|_c = |\Theta|_c + |\Gamma \setminus \Theta|_c$, $|\Gamma_{f^*}|_c = |\Theta|_c + |\Phi_{f^*} \setminus \Psi|_c$, and $|\Phi_{f^*} \setminus \Psi|_c = t(\kappa(f^*)r + s) < t(\kappa(f_j)r + s) = |\Gamma \setminus \Theta|_c$, it follows that $|\Gamma_{f^*}|_c < |\Gamma|_c$, a contradiction. Therefore, $f_j$ is a tight total-cover of minimum cost. □
Remark 29. The tight optimal total-covers $f_j$ and $f_k$, for $j, k \in [t]$ and $j \neq k$, might be different. As they have the same optimal cost, any one of them can be exhibited as a solution to $\mathcal{L}$.

Corollary 30. $|\Psi|_c/(\lambda + \lambda') \le |\Gamma|_c - t(\kappa(f)r + s) \le |\Psi|_c$, where $\kappa(f)r + s$ is the total number of labels in a tight optimal total-cover $f$ for $\mathcal{L}$.

We can now prove the main result of this section. 

Theorem 31. Unless P = NP, the minimum number of clauses of a pure Horn function on $n$ variables cannot be approximated in polynomial time (in $n$) to within a factor of

$$\rho_c(n^{\epsilon}) \ge 2^{\epsilon(\log n)^{1 - 1/\delta_c(n)}} = 2^{(\log n)^{1 - o(1)}},$$

where $\delta_c(n) = (\log\log n)^c$ and $c$ is any fixed constant in $(0, 1/2)$, even when the input is restricted to CNFs with $O(n^{1+2\epsilon})$ clauses, for some $\epsilon \in (0, 1/4]$.

Proof. Let $\mathcal{L}_0$ be one of the Label-Cover promise instances produced by Dinur and Safra's reduction (cf. Section 3) and let $\mathcal{L}$ be its extension.

Choose $d = r\lambda + s\lambda' + 1$ (cf. Lemma 23), let $\Phi$ be the canonical formula constructed from $\mathcal{L}$, and let $h$ be the pure Horn function it defines. Let $\Gamma$ be a clause minimum pure Horn CNF representation of $h$ obtained by some minimization algorithm when $\Phi$ is given as input.

Let $c$ be fixed arbitrarily close to $1/2$ (the larger the value of $c$, the larger the hardness of approximation factor). For convenience, let $\delta = \delta_c(s)$ and $\rho = \rho_c(s)$ (cf. Theorem 14). Using Notation 12 and the parametrization given by Remark 15, Lemma 18 and Corollary 30 give

$$|\Gamma|_c \ge st(\kappa(f)(\delta - 1) + 1) + \Omega(s^2\delta\rho^{\delta}) \ge st(\kappa(f)(\delta - 1) + 1) + \omega(s^2),$$



and 



$$|\Gamma|_c \le t(\kappa(f)r + s) + d(\pi + 2m\lambda') + r\lambda + s\lambda' \le st(\kappa(f)\delta + 1) + O(s^3\delta^2\rho^{2\delta}) \le st(\kappa(f)\delta + 1) + o(s^4).$$



Moreover, 




$$\kappa(\mathcal{L}) = 1 \implies |\Gamma|_c \le s^{1+1/\epsilon}\delta_c(s)(1 + o(1)),$$
$$\kappa(\mathcal{L}) \ge \rho_c(s) \implies |\Gamma|_c \ge s^{1+1/\epsilon}\delta_c(s)(\rho_c(s) + o(1)).$$



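Letting $n = |T|_v$ and relating $|T|_c$ to the number of variables of $h$, the above dichotomy reads as

$$\kappa(\mathcal{C}) = 1 \implies |T|_c \le n^{1+\varepsilon}\delta_c(n^{\varepsilon})(1 + o(1)),$$
$$\kappa(\mathcal{C}) \ge \rho_c(s) \implies |T|_c \ge n^{1+\varepsilon}\delta_c(n^{\varepsilon})(\rho_c(n^{\varepsilon}) + o(1)),$$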
giving a hardness of approximation factor of $\rho_c(n^{\varepsilon})$ for the pure Horn CNF minimization problem (cf.
Theorem 11).

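To conclude the proof, notice that the gap

$$\rho_c(n^{\varepsilon}) = 2^{(\varepsilon\log n)^{1 - 1/\delta_c(n^{\varepsilon})}} \ge 2^{\varepsilon(\log n)^{1 - 1/\delta_c(n)}}$$

and that the number of clauses $n^{1+\varepsilon}\delta_c(n^{\varepsilon}) \le n^{1+2\varepsilon}$. □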
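To get a feel for how slowly this gap grows, the sketch below evaluates its base-2 logarithm numerically. It assumes the Dinur and Safra type form $\rho_c(N) = 2^{(\log N)^{1 - 1/\delta_c(N)}}$ with $\delta_c(N) = (\log\log N)^c$ and base-2 logarithms; the function name `log2_gap` is ours, and everything is computed in log space because the gap itself is astronomically large.

```python
import math

def log2_gap(n_bits: float, c: float, eps: float) -> float:
    """log2 of rho_c(n**eps), where n = 2**n_bits.

    Assumes rho_c(N) = 2**((log2 N)**(1 - 1/delta_c(N))) with
    delta_c(N) = (log2 log2 N)**c; needs log2(n**eps) > 2 so that delta > 1.
    """
    L = eps * n_bits            # log2(n**eps)
    delta = math.log2(L) ** c   # delta_c(n**eps) = (log log n**eps)**c
    return L ** (1 - 1 / delta)

# The exponent grows with n but stays below log2(n**eps) = eps * n_bits.
for bits in (2**10, 2**12, 2**16):
    print(bits, round(log2_gap(bits, c=0.49, eps=0.25), 1))
```

Working with `n_bits = log2 n` keeps the example in floating point; the trend, not the exact numbers, is the point.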

5 Pure Horn 3-CNFs and Minimizing the Number of Literals 

Clauses of type (b) and (d) might have arbitrarily long subgoals in the reduction we presented (cf. Section 4).
It is possible to strengthen our result by modifying the gadget so that every clause is either quadratic or
cubic, and by proving that even in this case the pure Horn minimization problem remains hard to approximate.

Specifically, we replace the clauses of type (b) in a way similar to what is done in the reduction from
SAT to 3-SAT (cf. Garey and Johnson [16]), that is, in a linked-list fashion. For each vertex $y \in Y$, let
$d(y) := |N(y)|$ be its degree and let $(z_y^1, z_y^2, \ldots, z_y^{d(y)})$ be an arbitrary but fixed ordering of its neighbors.
Associate new propositional variables $e(\beta, x, y, \ell', i)$ with all edges $(x, y) \in E$, all labels $\ell' \in L'$ and all
indices $\beta \in [d(y) - 2]$ and $i \in [d]$, where, as before, $d = r\lambda + s\lambda' + 1$. Replace the clauses of type (b) by the
clauses below:

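(b1) $\bigwedge_{z \in N(y)} e(z, y, \ell', i) \to e(x, y, i)$, for all $(x, y) \in E$, $\ell' \in L'_y$, $i \in [d]$, $d(y) \le 2$;

(b2) $e(z_y^1, y, \ell', i) \wedge e(z_y^2, y, \ell', i) \to e(1, x, y, \ell', i)$, for all $(x, y) \in E$, $\ell' \in L'_y$, $i \in [d]$, $d(y) \ge 3$;

(b3) $e(z_y^{\beta+2}, y, \ell', i) \wedge e(\beta, x, y, \ell', i) \to e(\beta + 1, x, y, \ell', i)$, for all $(x, y) \in E$, $\ell' \in L'_y$, $i \in [d]$, $\beta \in [d(y) - 3]$;

(b4) $e(z_y^{d(y)}, y, \ell', i) \wedge e(d(y) - 2, x, y, \ell', i) \to e(x, y, i)$, for all $(x, y) \in E$, $\ell' \in L'_y$, $i \in [d]$, $d(y) \ge 3$.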
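The linked-list replacement can be sketched in a few lines of Python. The encoding is an assumption made only for illustration: a clause is a pair `(body, head)` of hashable variable names, and the helper `closure` is ordinary forward chaining for definite Horn clauses. The example confirms that, for $d(y) \ge 3$, the chain uses $d(y) - 1$ clauses with at most three literals each and derives $e(x, y, i)$ exactly when all $d(y)$ subgoals of the original type (b) clause hold.

```python
def closure(facts, clauses):
    """Forward chaining: every variable forced to true by `facts` under definite Horn `clauses`."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in clauses:
            if head not in known and all(v in known for v in body):
                known.add(head)
                changed = True
    return known

def chain_clauses(neighbors, x, y, label, i):
    """Replace AND_{z in N(y)} e(z, y, l', i) -> e(x, y, i) by clauses of types (b1)-(b4)."""
    def lit(z):      # original subgoal variable e(z, y, l', i)
        return ("e", z, y, label, i)

    def link(beta):  # new chain variable e(beta, x, y, l', i), beta in [d(y) - 2]
        return ("e", beta, x, y, label, i)

    head = ("e", x, y, i)
    dy = len(neighbors)
    if dy <= 2:  # (b1): the clause is already quadratic or cubic
        return [(tuple(lit(z) for z in neighbors), head)]
    clauses = [((lit(neighbors[0]), lit(neighbors[1])), link(1))]  # (b2)
    for beta in range(1, dy - 2):  # (b3), beta in [d(y) - 3]
        clauses.append(((lit(neighbors[beta + 1]), link(beta)), link(beta + 1)))
    clauses.append(((lit(neighbors[dy - 1]), link(dy - 2)), head))  # (b4)
    return clauses

nbrs = ["z1", "z2", "z3", "z4", "z5"]
cls = chain_clauses(nbrs, "x", "y", "l", 1)
facts = {("e", z, "y", "l", 1) for z in nbrs}
print(len(cls), ("e", "x", "y", 1) in closure(facts, cls))  # prints: 4 True
```

Dropping any single subgoal from `facts` breaks the chain, so the head is no longer derived, which is the behavior the original long clause had.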

The same technique does not work with the clauses of type (d), though, because it would
introduce clauses of the form $w_1 \wedge w_2 \to u(\ell)$, for $\ell \in L \cup L'$ and propositional variables $w_1$ and $w_2$,
implying that it could be advantageous to have $2t'$ prime implicates of the form $v(j) \to w_i$, with $i \in [2]$
and $j \in [t']$ (where $t'$ is a positive integer to be fixed in the same way as $t$ was), in a clause minimum
representation of $h'$ (the pure Horn function after introducing the new variables), thus rendering the gap
amplification device ineffective once again. We circumvent this possibility by replacing the clauses of type (d)
as follows. Associate new propositional variables $e(k, i)$ with all indices $k \in [m - 1]$ and $i \in [d]$, where
$m = |E|$ as before. Let $(e_1, e_2, \ldots, e_m)$ be an arbitrary but fixed ordering of the edges in $E$. Let $e(k, i)$ with
indices $k \in \{m, m + 1, \ldots, 2m - 1\}$ and $i \in [d]$ be aliases for the original propositional variables $e(x, y, i)$
in such a way that, for each edge $(x, y) \in E$, $(x, y) = e_{k-m+1}$ is associated with $e(k, i)$. Also fix an arbitrary
ordering $(\ell_1, \ell_2, \ldots, \ell_{d-1})$ of the labels in $L \cup L'$. Create a complete binary tree for each index $i \in [d]$ through the clauses



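(d1) $e(2k, i) \wedge e(2k + 1, i) \to e(k, i)$, for all $k \in [\lfloor (2m - 1)/2 \rfloor]$, $i \in [d]$;

(d2) $e(2m - 1, i) \to e(\lfloor (2m - 1)/2 \rfloor, i)$, for all $i \in [d]$, when $2m - 1$ is even;

(d3) $e(1, i) \wedge e(1, i + 1) \to u(\ell_i)$, for all $i \in [d - 1]$,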

in which the variables $e(x, y, i)$, for edges $(x, y) \in E$, are the leaves. The clauses of type (d3) associate each
label from $L \cup L'$ with the roots of two of the aforementioned trees in an orderly fashion. It helps to think about
the roots as nodes and the labels as links of a path of size $d$. In order for the path to be well defined (all
labels being reintroduced), all nodes must be present (every variable $e(1, i)$, $i \in [d]$, must be reached). This
implies that if, for some index $j \in [t']$, a pure Horn 3-CNF formula representing $h'$ has a prime implicate of
the form $v(j) \to e(k, i)$, for some $k \in [2m - 1]$ and $i \in [d]$, this prime implicate is either superfluous or it is
accompanied by at least $d - 1$ other prime implicates with subgoal $v(j)$ and having the same form. As $d$ works
as a second amplification device (in a similar way to the CNF case), it is clearly not advantageous for clause
minimum pure Horn 3-CNF formulae representing $h'$ to have prime implicates involving $v(j)$ of forms other
than $v(j) \to u(\ell)$, with $\ell \in L \cup L'$. Therefore, this construction suits our purposes.
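The tree and path gadget can be sketched the same way. The `(body, head)` clause encoding, the heap indexing of the tree nodes (children $2k$ and $2k + 1$, leaves $k = m, \ldots, 2m - 1$) and the helper names are our own illustrative choices, and the sketch covers only clauses (d1) and (d3). Removing a single leaf from one tree prevents its root $e(1, i)$ from being reached and, with it, the two labels attached to that root on the path.

```python
def closure(facts, clauses):
    """Forward chaining: every variable forced to true by `facts` under definite Horn `clauses`."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in clauses:
            if head not in known and all(v in known for v in body):
                known.add(head)
                changed = True
    return known

def tree_clauses(m, i):
    """(d1) for one heap-indexed binary tree with nodes e(k, i), k in [2m - 1].

    Internal nodes are k = 1..m-1; leaf k (m <= k <= 2m-1) aliases the edge
    variable e(x, y, i) of edge e_{k-m+1}.
    """
    return [((("e", 2 * k, i), ("e", 2 * k + 1, i)), ("e", k, i)) for k in range(1, m)]

def path_clauses(labels):
    """(d3): e(1, i) AND e(1, i + 1) -> u(l_i) for i in [d - 1], where d = len(labels) + 1."""
    return [((("e", 1, i), ("e", 1, i + 1)), ("u", lab)) for i, lab in enumerate(labels, start=1)]

m, labels = 5, ["a", "b", "c"]
d = len(labels) + 1
cls = [cl for i in range(1, d + 1) for cl in tree_clauses(m, i)] + path_clauses(labels)
leaves = {("e", k, i) for i in range(1, d + 1) for k in range(m, 2 * m)}

derived = closure(leaves - {("e", m + 2, 2)}, cls)  # one leaf of tree 2 missing
print([lab for lab in labels if ("u", lab) in derived])  # prints: ['c']
```

With every leaf present all labels are derived; the partial run shows why each root has to be reached for the whole path to be reintroduced.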

In a similar fashion to what was done before, let $\Psi'$ and $\Phi'$ denote the canonical pure Horn 3-CNF
formulae defined by the implications (a) through (d3) and by all the implications above, respectively, and
let $g'$ and $h'$ be, in that order, the pure Horn functions they represent.

Lemma 32. For $\Phi'$ as described, it holds that $d(\pi r + 2m\lambda' + 2m - 1) - 1 \le |\Phi'|_c - t(r\lambda + s\lambda') \le d(\pi + m^2\lambda' + 4m - 2)$ and that $t \le |\Phi'|_v \le t + dm(\lambda' + 2) + m^2\lambda'$, with $d$ and $t'$ as above and $r$, $s$, $m$, $\lambda$, $\lambda'$, and $\pi$ as in
Notation 12.

Proof. Let $\#(b) := \sum_{i \in [4]} \#(b_i)$ and $\#(d) := \sum_{i \in [3]} \#(d_i)$, where $\#(\alpha)$ denotes the number of clauses of
type $(\alpha)$ in $\Phi'$.

For each edge $(x, y) \in E$, there was a clause of type (b) in $\Phi$ whose subgoal had size equal to $d(y)$. Each
such clause was either maintained in case $d(y) \le 2$ (new clauses (b1)) or replaced by $d(y) - 1$ new clauses ((b2),
(b3), and (b4)) in a linked-list fashion in $\Phi'$, giving that

$$dm\lambda' \le \#(b) = d\lambda' \sum_{(x, y) \in E} (d(y) - 1) \le dm(m - 1)\lambda'.$$

The clauses of type (d) in $\Phi$ were replaced by clauses of types (d1), (d2), and (d3) in $\Phi'$. For the clauses (d1) and
(d2), notice that there are $d$ binary trees and each one has $2m - 1$ nodes and height $\eta = 1 + \lfloor \log(2m - 1) \rfloor =
\lfloor \log(4m - 2) \rfloor$. Also, there are $d - 1$ clauses of type (d3), implying that

$$d(2m - 1) - 1 \le \#(d) = d - 1 + d\sum_{k=0}^{\eta - 1} 2^{k} = d - 1 + d(2^{\eta} - 1) \le d(4m - 2).$$

The new gadget introduces $d(m - 1)$ new variables of the $e(k, i)$ kind (notice that the aliases are not new variables)
and at most $dm(m - 1)\lambda'$ new variables of the $e(\beta, x, y, \ell', i)$ kind. Now, using the estimates of Lemma 18, the
result follows. □

The same arguments used in Corollary 19 apply in this new setting, as the differences between the two
gadgets are local. Indeed, Lemmas 20-22, Lemmas 25-28, and Corollary 30 also transfer
in a semi-verbatim fashion: sometimes small changes to the objects and arguments used are needed, but
the overall ideas and the structure of the proofs are maintained. A short note on the proof of the analogue
of Lemma 23 is in order. With the introduction of the new variables in the 3-CNF construction, the sets $\Gamma_2^{j,i}$
for $j \in [t']$ and $i \in [d]$ need to be redefined accordingly to include every possible prime implicate involving
variables $v(j)$ not having the form of those in $\Gamma_1^{j}$, that is, not having the form $v(j) \to u(\ell)$ for $\ell \in L \cup L'$.
The remainder of the proof then transfers verbatim. For clarity (and also to illustrate
such adaptations), we present its proof below.

Lemma 33 (Analogue of Lemma 23). For $d = r\lambda + s\lambda' + 1$, the prime implicates involving variables $v(j)$,
for indices $j \in [t]$, in any clause minimum pure Horn CNF representation of $h'$ have the form $v(j) \to u(\ell)$
with $\ell \in L \cup L'$.

Proof. By Lemma 22, there are only three types of prime implicates involving the variables $v(j)$, with $j \in [t]$.
Let $T' = \Theta' \wedge \Gamma'$ be a clause minimum pure Horn 3-CNF representation of $h'$, with $\Theta'$ being a clause minimum
pure Horn 3-CNF representation of $g'$. Let $j \in [t]$ and, for $i \in [d]$, define

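$$w(i) := \{e(k, i), e(x, y, \ell', i), e(\beta, x, y, \ell', i) : k \in [2m - 1], (x, y) \in E, \ell' \in L'_y, \beta \in [d(y) - 2]\}$$

and consider the sets

$$\Gamma_1^{j} := \Gamma' \cap \{v(j) \to u(\ell) : \ell \in L \cup L'\},$$
$$\Gamma_2^{j,i} := \Gamma' \cap \{v(j) \to z : z \in w(i)\}.$$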

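Suppose $\sum_{j \in [t], i \in [d]} |\Gamma_2^{j,i}| > 0$, for otherwise there is nothing to show. Fix some $j \in [t]$ with $\sum_{i \in [d]} |\Gamma_2^{j,i}| > 0$, let $i_1, i_2 \in [d]$, $i_1 \neq i_2$,
and adjust the notation such that $|\Gamma_2^{j,i_1}| \le |\Gamma_2^{j,i_2}|$. If the equality does not hold then, as the conclusions drawn in
$h'$ for each $i \in [d]$ must be the same, it follows that $T'' = (T' \setminus \Gamma_2^{j,i_2}) \cup \Gamma^{j}_{i_1 \to i_2}$ is also a representation of $h'$,
where

$$\begin{aligned}
\Gamma^{j}_{i_1 \to i_2} := {} & \{v(j) \to e(k, i_2) : v(j) \to e(k, i_1) \in \Gamma_2^{j,i_1}\}\\
& \cup \{v(j) \to e(x, y, \ell', i_2) : v(j) \to e(x, y, \ell', i_1) \in \Gamma_2^{j,i_1}\}\\
& \cup \{v(j) \to e(\beta, x, y, \ell', i_2) : v(j) \to e(\beta, x, y, \ell', i_1) \in \Gamma_2^{j,i_1}\}.
\end{aligned}$$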

Since $|T''|_c < |T'|_c$, we get a contradiction with the clause minimality of $T'$. Thus, all the sets $\Gamma_2^{j,i}$, $i \in [d]$, have the same size $\gamma_j \ge 1$.
Since this implies $\sum_{i \in [d]} |\Gamma_2^{j,i}| = d\gamma_j \ge d > r\lambda + s\lambda' = |L \cup L'|$, the pure Horn 3-CNF

$$\Delta_j := \Big(T' \setminus \bigcup_{i \in [d]} \Gamma_2^{j,i}\Big) \cup \{v(j) \to u(\ell) : \ell \in L \cup L'\}$$

has fewer clauses than $T'$. As $\Delta_j$ represents $h'$ as well, this contradicts the clause minimality of $T'$ and the
result follows. □
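Several steps of the proof replace a set of clauses and claim that the result still represents $h'$. For definite (pure) Horn CNFs such a claim can be checked mechanically: a clause $A \to b$ is implied by a formula exactly when forward chaining from $A$ reaches $b$, and two formulas are equivalent when each implies every clause of the other. The sketch below is a generic illustration under our own `(body, head)` encoding, not part of the authors' construction.

```python
def closure(facts, clauses):
    """Forward chaining: every variable forced to true by `facts` under definite Horn `clauses`."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in clauses:
            if head not in known and all(v in known for v in body):
                known.add(head)
                changed = True
    return known

def implies(formula, clause):
    """For definite Horn formulas: formula |= (body -> head) iff head is forced by body."""
    body, head = clause
    return head in closure(body, formula)

def equivalent(f, g):
    """Two definite Horn CNFs represent the same function iff each implies every clause of the other."""
    return all(implies(f, c) for c in g) and all(implies(g, c) for c in f)

f = [(("a",), "b"), (("b",), "c")]
g = f + [(("a",), "c")]  # a redundant clause: same function, one clause more
print(equivalent(f, g), equivalent(f, [(("a",), "b")]))  # prints: True False
```

The redundant formula `g` is exactly the kind of representation that clause minimality rules out, as in the contradictions derived above.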

Notice that the proof above is essentially the same as the proof of Lemma 23. The only difference, as already
mentioned, lies in the definitions of the sets $\Gamma_2^{j,i}$, for indices $j \in [t]$ and $i \in [d]$. This tells us that: (i) the
new variables $e(1, i)$ play a role similar to the one fulfilled before by the variables $e(x, y, i)$, for $i \in [d]$ and
$(x, y) \in E$ as usual; (ii) the remaining new variables $e(k, i)$ with $k \in [m - 1] \setminus \{1\}$, despite being structurally
necessary, have no real influence on the outcome; and (iii) the new variables $e(\beta, x, y, \ell', i)$ have the same
status as the original variables $e(x, y, \ell', i)$, for edges $(x, y) \in E$, labels $\ell' \in L'$, and indices $i \in [d]$ and
$\beta \in [d(y) - 2]$.

We are then able to prove the following. 

Theorem 34. Unless P = NP, the minimum number of clauses of a pure Horn function on $n$ variables
cannot be approximated in polynomial time (in $n$) to within a factor of

$$\rho_c(n^{\varepsilon}) \ge 2^{\varepsilon(\log n)^{1 - 1/\delta_c(n)}} = 2^{(\log n)^{1 - o(1)}},$$

where $\delta_c(n) = (\log\log n)^c$ and $c$ is any fixed constant in $(0, 1/2)$, even when the input is restricted to 3-CNFs
with $O(n^{1+2\varepsilon})$ clauses, for some $\varepsilon \in (0, 1/6]$.

Now notice that, with the exception of the variables $v(j)$, with $j \in [t']$, which only appear as subgoals in
quadratic prime implicates, every other variable appears in subgoals and heads of mostly cubic pure Horn
clauses. Also, since the functions we are dealing with are pure Horn, we have no unit clauses, so every clause has two or three literals. Therefore, it
holds that $2|\Phi'|_c \le |\Phi'|_\ell \le 3|\Phi'|_c$ or, in other words, that $|\Phi'|_\ell = \Theta(|\Phi'|_c)$, where $\Phi'$ is a pure Horn 3-CNF
formula obtained from our 3-CNF construction above. We then have the following result.
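The bound $2|\Phi'|_c \le |\Phi'|_\ell \le 3|\Phi'|_c$ is simple bookkeeping: in a pure Horn 3-CNF every clause has one positive literal and one or two negative ones. The sketch below, using our own illustrative `(body, head)` encoding and hypothetical helper names, counts both quantities and checks the bound on a toy formula.

```python
def clause_and_literal_counts(clauses):
    """Return (number of clauses, number of literals); a clause (body, head) has len(body) + 1 literals."""
    n_clauses = len(clauses)
    n_literals = sum(len(body) + 1 for body, _ in clauses)
    return n_clauses, n_literals

def is_pure_horn_3cnf(clauses):
    """Pure Horn 3-CNF in this encoding: one head and one or two subgoals per clause (no unit clauses)."""
    return all(1 <= len(body) <= 2 for body, _ in clauses)

phi = [(("v1",), "u1"), (("a", "b"), "c"), (("c", "d"), "e")]
c, l = clause_and_literal_counts(phi)
assert is_pure_horn_3cnf(phi) and 2 * c <= l <= 3 * c
print(c, l)  # prints: 3 8
```

Because the ratio of literals to clauses is pinned between 2 and 3, any gap for clause minimization carries over to literal minimization up to a constant factor.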

Corollary 35. Unless P = NP, the minimum number of literals of a pure Horn function on $n$ variables
cannot be approximated in polynomial time (in $n$) to within a factor of $2^{(\log n)^{1 - o(1)}}$, even when the input
is restricted to 3-CNFs with $O(n^{1+\varepsilon})$ clauses, for some small constant $\varepsilon > 0$.
6 Sub-exponential Time Hardness Results



The hardness of approximation results shown in Sections 4 and 5 concern approximations obtained in
polynomial time in the number of variables of the pure Horn functions in question. In this section, we show that,
under another well-known complexity hypothesis, allowing sub-exponential time in the number of variables
is still not enough to obtain constant factor approximations.

The following conjecture concerning the time complexity of the $k$-SAT problem (i.e., the problem of
determining whether a $k$-CNF has a satisfying assignment) was introduced in Impagliazzo and Paturi [19]:

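Conjecture 36. For $k \ge 3$, define $s_k$ to be the infimum of

$$\{\delta : \text{there exists an } O(2^{\delta n}\,\mathrm{poly}(n)) \text{ time algorithm for solving the } k\text{-SAT problem}\},$$

with $n$ being the number of variables of the $k$-SAT instance. The Exponential Time Hypothesis (ETH) states
that $s_k > 0$ for $k \ge 3$.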

In other words, if true, the ETH implies that there is no sub-exponential time algorithm for $k$-SAT with
$k \ge 3$ and hence that P ≠ NP (the converse, however, is not known to hold). The ETH has many implications
beyond search problems, for instance, in fields such as communication complexity and structural complexity, and
is widely believed to be true.

In a recent breakthrough, Moshkovitz and Raz [24] introduced a new two-query projection test PCP
system with sub-constant error and quasi-linear size. One way of interpreting their result is the following.
The maximization flavor of the Label-Cover$_\varepsilon$ promise problem receives a Label-Cover instance in which
each vertex can receive at most one label, and the goal is to decide whether there is a total-cover for the
instance in question or every labeling covers at most an $\varepsilon$ fraction of the edges. Let $\phi$ be any 3-SAT instance
of size (number of clauses) equal to $\sigma$ and let $\varepsilon = \varepsilon(\sigma) \ge 1/\log^{\beta}\sigma$, for some positive and sufficiently small
$\beta$. There is a reduction from $\phi$ to a Label-Cover instance of maximization flavor and promise $\varepsilon$ whose
graph has size at most $\sigma^{1+o(1)}\mathrm{poly}(1/\varepsilon)$ and such that $\log|L| \le \mathrm{poly}(1/\varepsilon)$ and $\log|L'| \le O(\log(1/\varepsilon))$, where,
as before, $L$ and $L'$ are the label sets. For $\varepsilon$ as above, this reduction is quasi-linear sized in $\sigma$. It also shows
that it is NP-hard to solve this maximization flavor of Label-Cover$_\varepsilon$. While the soundness error $\varepsilon$ is not
as small as that of Dinur and Safra [12], its quasi-linear size implies that, assuming the ETH (i.e., that 3-SAT
takes $\exp(\Omega(\sigma))$ time to solve), the maximization version of the Label-Cover$_\varepsilon$ promise problem cannot be
solved in less than $\exp(\sigma^{1-o(1)})$ time, thus ruling out better approximations in sub-exponential time.
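The step from quasi-linear size to this time lower bound can be spelled out as follows. Here $N$ denotes the size of the Label-Cover instance produced from $\phi$ (so $N = \sigma^{1+o(1)}$, since $\mathrm{poly}(1/\varepsilon) = \mathrm{poly}(\log\sigma) = \sigma^{o(1)}$), and $\eta > 0$ is an arbitrary fixed constant introduced only for this illustration. If the promise problem could be solved in time $2^{N^{1-\eta}}$, composing that algorithm with the polynomial time reduction would decide $\phi$ in time

$$\mathrm{poly}(\sigma) + 2^{N^{1-\eta}} = \mathrm{poly}(\sigma) + 2^{\sigma^{(1+o(1))(1-\eta)}} = 2^{\sigma^{1-\eta+o(1)}} = 2^{o(\sigma)},$$

which contradicts the assumption that 3-SAT requires $\exp(\Omega(\sigma))$ time.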

A simpler proof was presented in Dinur and Harsha [11], who also observed that the reduction can be
carried out with $\varepsilon$ taken as small as $2^{-O(\log^{\beta}\sigma)}$, for some $\beta \in (0, 1)$, at the cost of the label set $L$ having size
sub-exponential in $\sigma$. Using the 'weak duality' between the maximization and minimization flavors, we can
read their result as follows.

Theorem 37 (Dinur and Harsha [11]). There exist constants $c > 0$ and $\beta \in (0, 1)$ such that, for every
function $1 < p(\sigma) \le 2^{O(\log^{\beta}\sigma)}$, there exist alphabets $L$ and $L'$ of sizes $O(2^{p(\sigma)^c})$ and $O(p(\sigma)^c)$, respectively,
and for these alphabets there are Label-Cover instances with promise $p(\sigma)$ such that, even under nearly
linear length preserving reductions, it is NP-hard to distinguish, for any such instance, between the
case where it has a total-cover of cost 1 and the case where every total-cover for it has cost at least $p(\sigma)$.

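Remark 38. We then have a new parametrization, slightly different from the one in Remark 15: $s = \sigma^{1+o(1)}$,
with $o(1) \approx (\log\log\sigma)^{-\Omega(1)}$, $r = \sigma 2^{O(\log^{\beta}\sigma)}$, $m = rp(\sigma)^c$, $\lambda = O(2^{p(\sigma)^c})$, and $\lambda' = O(p(\sigma)^c)$.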

We shall deal with polynomial time reductions first. Let $\theta < \beta$ be such that $c\theta < 1$ and take $p(\sigma) = \log^{\theta}\sigma$.
Then $\lambda = O(2^{(\log^{\theta}\sigma)^c}) = O(\sigma)$ and $\lambda' = O((\log^{\theta}\sigma)^c) \le O(\log\sigma)$, and the size of the instance in question is



and that $d$, one of the amplification devices we make use of in our reductions (cf. Lemma 23), is equal to

$$d = r\lambda + s\lambda' + 1 = \Theta\big(2^{O(\log^{\beta}\sigma)}\sigma^2 + \sigma^{1+o(1)}\log^{c\theta}\sigma\big). \qquad (2)$$



It is then possible to show hardness results for the sub-exponential time case in a similar way. 

Theorem 39. Assuming the ETH, the minimum numbers of clauses and literals of pure Horn functions on
$n$ variables cannot be approximated in $\exp(n^{\delta})$ time, for some $\delta \in (0, 1)$, to within factors of $\Theta(\log^{\theta} n)$, for
some $\theta \in (0, 1)$, even when the input is restricted to 3-CNFs with $O(n^{1+2\varepsilon})$ clauses, for some small constant
$\varepsilon > 0$.

Proof. Let $\mathcal{C}_0$ be one of Dinur and Harsha's Label-Cover promise instances and let $\mathcal{C}$ be its extension.
Let $d$ be as in Equation (2), $\Phi'$ be the canonical pure Horn 3-CNF formula constructed from $\mathcal{C}$, and $h'$ be the
pure Horn function it defines. Let $T'$ be a clause minimum pure Horn 3-CNF representation of $h'$ obtained
by some minimization algorithm when $\Phi'$ is given as input.

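For convenience, let $p = p(\sigma)$ and $\zeta = \zeta(\sigma) = r/s = 2^{O(\log^{\beta}\sigma)}/\sigma^{o(1)}$. Using Notation 12 and the new
parametrization above, Lemma 32 and an analogue of Corollary 30 give

$$\begin{aligned}
|T'|_c &\ge t(\kappa(f)r + s) + \frac{d(\pi r + 2m\lambda' + 2m - 1) - 1}{\lambda + \lambda'}\\
&\ge st(\kappa(f)r/s + 1) + \Omega\big(\sigma^{1+o(1)}2^{\log^{\tau}\sigma}\log^{2c\theta}\sigma\big)\\
&\ge st(\kappa(f)\zeta + 1) + \omega(s),
\end{aligned}$$

and

$$\begin{aligned}
|T'|_c &\le t(\kappa(f)r + s) + d(\pi + m^2\lambda' + 4m - 2)\\
&\le st(\kappa(f)r/s + 1) + O\big(\sigma^4 2^{\log^{\tau}\sigma}\log^3\sigma\big)\\
&\le st(\kappa(f)\zeta + 1) + o(s^5),
\end{aligned}$$

where $\tau > 0$ is some constant. Moreover,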

$$t \le |T'|_v \le t + dm(\lambda' + 2) + m^2\lambda' \le t + O\big(\sigma^2 2^{2\log^{\tau}\sigma}\log^2\sigma\big) \le t + o(s^3).$$

Choosing $\varepsilon' > 0$ such that $t = s^{1/\varepsilon'} = \Omega(s^4)$ gives $|T'|_c = s^{1+1/\varepsilon'}\zeta(\sigma)(\kappa(f) + o(1))$ and $|T'|_v = s^{1/\varepsilon'}(1 + o(1))$
as $s \to \infty$, implying that the construction of $\Phi'$ can be performed in polynomial time in the number of variables
of $h'$.

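It also gives the following dichotomy

$$\kappa(\mathcal{C}) = 1 \implies |T'|_c \le s^{1+1/\varepsilon'}\zeta(\sigma)(1 + o(1)),$$
$$\kappa(\mathcal{C}) \ge p(\sigma) \implies |T'|_c \ge s^{1+1/\varepsilon'}\zeta(\sigma)(p(\sigma) + o(1)),$$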

with $p(\sigma) = \log^{\theta}\sigma$ in this case. Let $n = |T'|_v$ and let $\varepsilon = \varepsilon'/(1 + o(1))$. Relating the number of clauses of
$T'$ to the number of variables of $h'$, the above dichotomy reads as

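$$\kappa(\mathcal{C}) = 1 \implies |T'|_c \le n^{1+\varepsilon'}\zeta(n^{\varepsilon})(1 + o(1)),$$
$$\kappa(\mathcal{C}) \ge p(\sigma) \implies |T'|_c \ge n^{1+\varepsilon'}\zeta(n^{\varepsilon})(p(n^{\varepsilon}) + o(1)),$$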

giving a hardness of approximation factor of $p(n^{\varepsilon})$ for the pure Horn 3-CNF clause minimization problem
(cf. Theorem 37). As $\varepsilon^{\theta}$ is a constant, it follows that the gap

$$p(n^{\varepsilon}) = \log^{\theta}(n^{\varepsilon}) = \varepsilon^{\theta}\log^{\theta} n = \Theta(\log^{\theta} n).$$
Also, the number of clauses $n^{1+\varepsilon}\zeta(n^{\varepsilon}) \le n^{1+2\varepsilon}$.

To conclude the clause minimization part, observe that sub-exponential time in $\sigma$, namely $2^{o(\sigma)}$, is
equivalent to $2^{o(n^{1/4 - o(1)})}$ time in $n$ and, since $|T'|_c = O(n^{1+2\varepsilon})$, the time bound follows for $\delta < (1 - 2\varepsilon')/4$.

Regarding literal minimization, the structure of $\Phi'$ implies that its numbers of clauses and literals differ by
only a constant factor (cf. Section 5) and, therefore, similar results hold in this case as well. □

A natural next step is trying to push the hardness of approximation factor further by allowing, for
instance, super-polynomially sized Label-Cover instances. More specifically, let $b > 1$ be such that $\alpha =
bc > 1$ and take $p(\sigma) = \log^{b}\sigma = o(2^{\log^{\beta}\sigma})$ for some $\beta \in (0, 1)$. It then follows that $\lambda = O(2^{\log^{\alpha}\sigma}) =
O(\sigma^{\log^{\alpha - 1}\sigma})$, $\lambda' = O(\log^{\alpha}\sigma)$, and that $\pi \le O(2^{\log^{\alpha}\sigma + O(\log^{\beta}\sigma)}\sigma\log^{\alpha}\sigma)$, implying that the size of the Label-Cover
instance is now super-polynomial in $\sigma$. Also, it follows that the value we chose for $d$ is

$$d = \Theta\big(\sigma 2^{\log^{\alpha}\sigma + O(\log^{\beta}\sigma)}\big).$$

Following the steps of the proof of Theorem 39, we obtain that

u(s 2 ) < |T'| C - t(K(f)r + s)<o (a^"' 1 ff ) 

and that

$$t \le |T'|_v \le t + o\big(\sigma^{3 + \log^{\alpha - 1}\sigma}\big).$$

Now, choosing $t = (8\sigma)^{\log 8\sigma}$ gives a hardness of approximation factor for pure Horn clause minimization
equal to $p(\sigma)$. Writing $\sigma$ as a function of $n$, we obtain that $\sigma = 2^{\sqrt{\log n}}/8$ and hence a hardness of
approximation factor

$$p(n) = \log^{b}\Big(\frac{2^{\sqrt{\log n}}}{8}\Big) = \big(\sqrt{\log n} - 3\big)^{b} = \Theta(\log^{b/2} n).$$
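The change of variables above is easy to check numerically. The sketch assumes base-2 logarithms and identifies $n$ with $t$, ignoring the lower-order term in $|T'|_v$; it works with $\log_2 t$ rather than $t$ itself, which is astronomically large. The function names are ours.

```python
import math

def log2_t(sigma):
    """log2 of t = (8*sigma)**log2(8*sigma), i.e. (log2(8*sigma))**2."""
    return math.log2(8 * sigma) ** 2

def sigma_from_log2_n(log2_n):
    """Invert n ~ t: sigma = 2**sqrt(log2 n) / 8."""
    return 2 ** math.sqrt(log2_n) / 8

def gap(log2_n, b):
    """Hardness factor p = log2(sigma)**b = (sqrt(log2 n) - 3)**b."""
    return (math.sqrt(log2_n) - 3) ** b

sigma = 2 ** 20
ln = log2_t(sigma)  # (3 + 20)**2
print(ln, sigma_from_log2_n(ln) == sigma, gap(ln, b=2))  # prints: 529.0 True 400.0
```

Since $\log_2 t = (\log_2 8\sigma)^2$, the gap $\log^b\sigma$ is polylogarithmic in $n$ with exponent $b/2$, matching the estimate above.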

As $|T'|_c = O(n^2)$, sub-exponential time $2^{o(\sigma)}$ means

$$\mu(n) := 2^{o((2^{\sqrt{\log n}}/8)^{1/2})} = \omega\big(2^{(\log n)^{C}}\big)$$

time, for any (huge) constant $C > 1$. We have thus proved the following theorem.

Theorem 40. Assuming the ETH, the minimum numbers of clauses and literals of pure Horn functions on
$n$ variables cannot be approximated in $\mu(n)$ time to within factors of $O(\log^{a} n)$, for all $a > 1$, even when the
input is restricted to 3-CNFs with $O(n^2)$ clauses.

Now, if we take $p = 2^{O(\log^{\beta}\sigma)}$ for some $\beta \in (0, 1)$, we have that $\lambda = 2^{2^{O(\log^{\beta}\sigma)}}$, $\lambda' = 2^{O(\log^{\beta}\sigma)}$, $m = \sigma 2^{O(\log^{\beta}\sigma)}$,
and $\pi = d = \sigma 2^{O(\log^{\beta}\sigma)}2^{2^{O(\log^{\beta}\sigma)}}$, that is, we obtain a sub-exponential sized Label-Cover
instance and, consequently, a sub-exponential sized pure Horn 3-CNF. This implies that

T'| c - i(«(/)r + s) / n2 o ( ia e 



and 



t < \T'\ V <t + o(a 2 2 2C 



Letting $n := |T'|_v - t = 2^{2^{\gamma\log^{\beta}\sigma + 1}}$, for some constant $\gamma$, gives

$$\sigma = 2^{((\log\log n - 1)/\gamma)^{1/\beta}}$$

and a hardness of approximation factor $p(\sigma) = (\log n)/2$, i.e., a similar $O(\log n)$ hardness factor under far
more stringent time constraints. Such a result is rendered obsolete by Theorems 34 and 40.
7 Conclusions



In the last three sections, we have shown improved hardness of approximation results for Horn CNF
minimization (regarding the number of clauses and the number of literals). A natural question that
comes to mind concerns the tightness of these results. It was shown in Hammer and Kogan [18] that, for a
pure Horn function on $n$ variables, it is possible to approximate the minimum number of clauses and the
minimum number of literals of a pure Horn CNF formula representing it to within factors of $n - 1$ and $\binom{n}{2}$,
respectively. This leaves a considerable gap in the polynomial time case and an even larger gap in the
sub-exponential time case (we are not aware of any sub-exponential time approximation algorithm
for such problems). Whether these gaps can be narrowed remains to be seen. Improvements on two-query
sub-constant error PCP systems, leading to larger gaps on the Label-Cover problem, would immediately
imply (as long as the instances are polynomially sized) larger hardness of approximation results for the
problems we discussed. Such improvements might allow one to obtain hardness results for super-polynomial
time and to close the gap between $\exp(n^{\delta})$ and $\exp(o(n))$ sub-exponential times, for some $\delta \in (0, 1)$, both of
which we left open. A second possibility is to use a different promise problem (with a larger gap than
that of Label-Cover) as a starting point and to construct a new reduction.

In the opposite direction, it might be possible to design new approximation algorithms with smaller
approximation factors.

We conjecture that it is possible to improve the hardness of approximation factor for clause minimization
of a Horn function in $n$ variables in polynomial time to at least $n^{\varepsilon}$ for some small $\varepsilon > 0$ (perhaps even
$n^{1/2-\varepsilon}$), and that a similar factor holds in the case of literal minimization (up to a constant). In favor of our
conjectures, we mention the fact that Umans [25] showed that, for general Boolean functions (i.e., non-Horn),
the decision versions of the clause and literal minimization problems are $\Sigma_2^p$-complete and that it is also $\Sigma_2^p$-hard to
approximate such quantities to within factors of $N^{\varepsilon}$ for some $\varepsilon > 0$, where $N$ is the size of the input (a
low-degree polynomial in the number of variables of the function in question).

Acknowledgements. We would like to thank O. Cepek and P. Kucera for reading an earlier version of 
this article and providing some useful comments. 

References 

[1] Sanjeev Arora, Laszlo Babai, Jacques Stern, and Z. Sweedyk. The hardness of approximate optima in lattices, codes, and 
systems of linear equations. J. Comput. System Sci., 54(2, part 2):317-331, 1997. 34th Annual Symposium on Foundations 
of Computer Science (Palo Alto, CA, 1993). 

[2] Sanjeev Arora and Carsten Lund. Hardness of approximations, in Approximation Algorithms for NP-Hard Problems, D. 
Hochbaum, ed., PWS Publishing, Boston, 1996. 

[3] Sanjeev Arora, Carsten Lund, Rajeev Motwani, Madhu Sudan, and Mario Szegedy. Proof verification and the hardness of
approximation problems. J. ACM, 45(3):501-555, 1998. 

[4] Sanjeev Arora and Shmuel Safra. Probabilistic checking of proofs: a new characterization of NP. J. ACM, 45(1):70-122, 
1998. 

[5] Giorgio Ausiello, Alessandro D'Atri, and Domenico Sacca. Minimal representation of directed hypergraphs. SIAM J. 
Comput., 15(2):418-431, 1986. 

[6] Amitava Bhattacharya, Bhaskar DasGupta, Dhruv Mubayi, and Gyorgy Turan. On approximate Horn formula
minimization. In Samson Abramsky, Cyril Gavoille, Claude Kirchner, Friedhelm Meyer auf der Heide, and Paul G. Spirakis, editors,
ICALP (1), volume 6198 of Lecture Notes in Computer Science, pages 438-450. Springer, 2010.

[7] Endre Boros and Ondrej Cepek. On the complexity of Horn minimization. RUTCOR Research Report RRR 1-94, Rutgers
University, New Brunswick, NJ, USA, 1994. 

[8] Endre Boros, Ondrej Cepek, Alexander Kogan, and Petr Kucera. Exclusive and essential sets of implicates of Boolean
functions. Discrete Appl. Math., 158(2):81-96, 2010. 

[9] Endre Boros and Aritanan Gruber. Hardness results for approximate pure Horn CNF formulae minimization. In Inter- 
national Symposium on Artificial Intelligence and Mathematics (ISAIM 2012), Fort Lauderdale, Florida, USA, January 
9-11, 2012. 

[10] Yves Crama and Peter L. Hammer, editors. Boolean Functions: Theory, Algorithms, and Applications, volume 142 of 
Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 2011. 






[11] Irit Dinur and Prahladh Harsha. Composition of low-error 2-query PCPs using decodable PCPs. In 2009 50th Annual
IEEE Symposium on Foundations of Computer Science (FOCS 2009), pages 472-481. IEEE Computer Soc., Los Alamitos, 
CA, 2009. 

[12] Irit Dinur and Shmuel Safra. On the hardness of approximating label-cover. Inform. Process. Lett., 89(5):247-254, 2004.

[13] William F. Dowling and Jean H. Gallier. Linear-time algorithms for testing the satisfiability of propositional Horn formulae. 
J. Log. Program., 1(3):267-284, 1984.

[14] Uriel Feige, Shafi Goldwasser, Laszlo Lovasz, Shmuel Safra, and Mario Szegedy. Interactive proofs and the hardness of
approximating cliques. J. ACM, 43(2):268-292, 1996. 

[15] Uriel Feige and Laszlo Lovasz. Two-prover one-round proof systems: Their power and their problems (extended abstract).
In STOC, pages 733-744. ACM, 1992. 

[16] Michael R. Garey and David S. Johnson. Computers and intractability. W. H. Freeman and Co., San Francisco, Calif.,
1979. A guide to the theory of NP-completeness, A Series of Books in the Mathematical Sciences. 

[17] Peter L. Hammer and Alexander Kogan. Horn functions and their DNFs. Inf. Process. Lett., 44(1):23-29, 1992.

[18] Peter L. Hammer and Alexander Kogan. Optimal compression of propositional Horn knowledge bases: complexity and 
approximation. Artificial Intelligence, 64(1):131-145, 1993.

[19] Russell Impagliazzo and Ramamohan Paturi. On the complexity of k-SAT. J. Comput. System Sci., 62(2):367-375, 2001.
Special issue on the Fourteenth Annual IEEE Conference on Computational Complexity (Atlanta, GA, 1999). 

[20] Alon Itai and Johann A. Makowsky. Unification as a complexity measure for logic programming. J. Log. Program., 
4(2):105-117, 1987. 

[21] G. Kortsarz. On the hardness of approximating spanners. Algorithmica, 30(3):432-450, 2001. Approximation algorithms
for combinatorial optimization problems. 

[22] David Maier. Minimum covers in relational database model. J. ACM, 27(4):664-674, 1980.

[23] Michel Minoux. LTUR: A simplified linear-time unit resolution algorithm for Horn formulae and computer implementation.
Inf. Process. Lett., 29(1):1-12, 1988. 

[24] Dana Moshkovitz and Ran Raz. Two-query PCP with subconstant error. J. ACM, 57(5):Art. 29, 29, 2010. 

[25] Christopher Umans. Hardness of approximating $\Sigma_2^p$ minimization problems. In 40th Annual Symposium on Foundations
of Computer Science (New York, 1999), pages 465-474. IEEE Computer Soc, Los Alamitos, CA, 1999. 

[26] Vijay V. Vazirani. Approximation algorithms. Springer, 2001. 

[27] David P. Williamson and David B. Shmoys. The design of approximation algorithms. Cambridge University Press, 
Cambridge, 2011. 






